AITopics | Wayne

Generate-then-Verify: Reconstructing Data from Limited Published Statistics

Liu, Terrance, Xiao, Eileen, Smith, Adam, Thaker, Pratiksha, Wu, Zhiwei Steven

arXiv.org Machine Learning, Jun-12-2025

We study the problem of reconstructing tabular data from aggregate statistics, in which the attacker aims to identify interesting claims about the sensitive data that can be verified with 100% certainty given the aggregates. Prior work has demonstrated successful attacks in settings where the set of published statistics is rich enough that entire datasets can be reconstructed with certainty. In our work, we instead focus on the regime where many possible datasets match the published statistics, making it impossible to reconstruct the entire private dataset perfectly (i.e., when approaches in prior work fail). We propose the problem of partial data reconstruction, in which the goal of the adversary is instead to output a $\textit{subset}$ of rows and/or columns that are $\textit{guaranteed to be correct}$. We introduce a novel integer programming approach that first $\textbf{generates}$ a set of claims and then $\textbf{verifies}$ whether each claim holds for all possible datasets consistent with the published aggregates. We evaluate our approach on the housing-level microdata from the U.S. Decennial Census release, demonstrating that privacy violations can still persist even when information published about such data is relatively sparse.
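
The generate-then-verify idea in this abstract can be illustrated with a toy sketch. This is not the authors' integer programming method: it swaps in brute-force enumeration over a tiny, hypothetical two-attribute domain, and the attribute names, the three published counts, and the function names are all assumptions made for illustration. A claim `(row, k)` means "the private data contains at least `k` copies of `row`", and it is kept only if it holds in every dataset that matches the published counts.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical toy domain: each row is (age_group, owns_home).
AGES = ("under65", "65+")
OWNS = (True, False)
DOMAIN = list(product(AGES, OWNS))


def stats(dataset):
    """Aggregate statistics assumed to be published about a dataset."""
    return {
        "n": len(dataset),
        "owners": sum(1 for _, owns in dataset if owns),
        "seniors": sum(1 for age, _ in dataset if age == "65+"),
    }


def consistent_datasets(published):
    """Enumerate every multiset of rows whose statistics match exactly."""
    n = published["n"]
    return [d for d in combinations_with_replacement(DOMAIN, n)
            if stats(d) == published]


def verified_claims(published):
    """Generate claims from one consistent dataset, keep those true in all."""
    candidates = consistent_datasets(published)
    if not candidates:
        return []
    # Generate: any claim true in all candidates must be true in the first,
    # so the first candidate is a sufficient source of claims.
    first = Counter(candidates[0])
    claims = [(row, k) for row, count in first.items()
              for k in range(1, count + 1)]
    # Verify: a claim survives only if every consistent dataset satisfies it.
    return [(row, k) for row, k in claims
            if all(Counter(d)[row] >= k for d in candidates)]
```

With `{"n": 3, "owners": 3, "seniors": 1}` only one dataset is consistent, so every row is recovered. With `{"n": 3, "owners": 2, "seniors": 1}` two datasets are consistent, and the only surviving claim is that at least one `("under65", True)` row exists, which is the partial-reconstruction regime the abstract describes. Enumeration grows exponentially with the domain and dataset size, which is presumably why a solver-based formulation is needed in practice.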

artificial intelligence, household, householder, (15 more...)

arXiv.org Machine Learning

2504.21199

Country:

North America > United States > Michigan > Wayne County > Wayne (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval

Sundararaj, Jayaprakash

arXiv.org Artificial Intelligence, May-20-2024

Automatic summarization is the process of reducing a text document in order to generate a summary that retains the most important points of the original document. In this work, we study two problems: i) summarizing a text document as a set of keywords or a caption for image recommendation, and ii) generating an opinion summary that strikes a good mix of relevancy and sentiment with respect to the text document. Initially, we present our work on recommending images to enhance a substantial amount of existing plain-text news articles. We use probabilistic models and word similarity heuristics to generate captions and extract key-phrases, which are re-ranked using a rank aggregation framework with a relevance feedback mechanism. We show that such rank aggregation and relevance feedback, which are typically used in document tagging and text information retrieval, also help improve image retrieval. These queries are fed to the Yahoo Search Engine to obtain relevant images. Our proposed method is observed to perform better than all existing baselines. Additionally, we propose a set of submodular functions for opinion summarization. Opinion summarization has built into it the tasks of summarization and sentiment detection. However, it is not easy to detect sentiment and simultaneously extract a summary. The two tasks conflict in the sense that the demand of compression may drop sentiment-bearing sentences, and the demand of sentiment detection may bring in redundant sentences. However, using submodularity we show how to strike a balance between the two requirements. Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment along with a good ROUGE score. We also compare the performances of the proposed submodular functions.

keyword, submodular function, summarization, (15 more...)

arXiv.org Artificial Intelligence

2406.00013

Country:

South America > Argentina (0.04)
North America > United States > Michigan > Wayne County > Wayne (0.04)
Asia > India > Maharashtra > Mumbai (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Clustering US Counties to Find Patterns Related to the COVID-19 Pandemic

Brown, Cora, Milstein, Sarah, Sun, Tianyi, Zhao, Cooper

arXiv.org Artificial Intelligence, Mar-19-2023

When COVID-19 first started spreading and quarantine was implemented, the Society for Industrial and Applied Mathematics (SIAM) Student Chapter at the University of Minnesota-Twin Cities began a collaboration with Ecolab to use our skills as data scientists and mathematicians to extract useful insights from relevant data relating to the pandemic. This collaboration consisted of multiple groups working on different projects. In this write-up we focus on using clustering techniques to help us find groups of similar counties in the US and use that to help us understand the pandemic. Our team for this project consisted of University of Minnesota students Cora Brown, Sarah Milstein, Tianyi Sun, and Cooper Zhao, with help from Ecolab Data Scientist Jimmy Broomfield and University of Minnesota student Skye Ke. In the sections below we describe all of the work done for this project. In Section 2, we list the data we gathered, as well as the feature engineering we performed. In Section 3, we describe the metrics we used for evaluating our models. In Section 4, we explain the methods we used for interpreting the results of our various clustering approaches. In Section 5, we describe the different clustering methods we implemented. In Section 6, we present the results of our clustering techniques and provide relevant interpretation. Finally, in Section 7, we provide some concluding remarks comparing the different clustering methods.

artificial intelligence, cluster 0, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2303.11936

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Michigan > Wayne County > Wayne (0.04)
North America > United States > Texas > Dallas County > Dallas (0.04)
(26 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Epidemiology (0.86)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.63)
Health & Medicine > Therapeutic Area > Immunology (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Graph Attention Networks Unveil Determinants of Intra- and Inter-city Health Disparity

Liu, Chenyue, Fan, Chao, Mostafavi, Ali

arXiv.org Artificial Intelligence, Oct-20-2022

Understanding the determinants underlying variations in urban health status is important for informing urban design and planning, as well as public health policies. Multiple heterogeneous urban features could modulate the prevalence of diseases across different neighborhoods in cities and across different cities. This study examines heterogeneous features related to socio-demographics, population activity, mobility, and the built environment and their non-linear interactions to examine intra- and inter-city disparity in prevalence of four disease types: obesity, diabetes, cancer, and heart disease. Features related to population activity, mobility, and facility density are obtained from large-scale anonymized mobility data. These features are used in training and testing graph attention network (GAT) models to capture non-linear feature interactions as well as spatial interdependence among neighborhoods. We tested the models in five U.S. cities across the four disease types. The results show that the GAT model can predict the health status of people in neighborhoods based on the top five determinant features. The findings unveil that population activity and built-environment features along with socio-demographic features differentiate the health status of neighborhoods to such a great extent that a GAT model could predict the health status using these features with high accuracy. The results also show that the model trained on one city can predict health status in another city with high accuracy, allowing us to quantify the inter-city similarity and discrepancy in health status. The model and findings provide novel approaches and insights for urban designers, planners, and public health officials to better understand and improve health disparities in cities by considering the significant determinant features and their interactions.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2210.10142

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Arkansas > Cross County (0.05)
North America > United States > New York > Queens County > New York City (0.04)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.87)

Industry:

Health & Medicine > Public Health (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.59)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Anomaly Detection for High-Dimensional Data Using Large Deviations Principle

Guggilam, Sreelekha, Chandola, Varun, Patra, Abani

arXiv.org Machine Learning, Sep-28-2021

Most current anomaly detection methods suffer from the curse of dimensionality when dealing with high-dimensional data. We propose an anomaly detection algorithm that can scale to high-dimensional data using concepts from the theory of large deviations. The proposed Large Deviations Anomaly Detection (LAD) algorithm is shown to outperform state-of-the-art anomaly detection methods on a variety of large and high-dimensional benchmark data sets. Exploiting the ability of the algorithm to scale to high-dimensional data, we propose an online anomaly detection method to identify anomalies in a collection of multivariate time series. We demonstrate the applicability of the online algorithm in identifying counties in the United States with anomalous trends in terms of COVID-19 related cases and deaths. Several of the identified anomalous counties correlate with counties with a documented poor response to the COVID-19 pandemic.

dataset, detection, time series, (11 more...)

arXiv.org Machine Learning

2109.13698

Country:

North America > United States > New York > Erie County > Buffalo (0.04)
North America > United States > Michigan > Wayne County > Wayne (0.04)
North America > United States > Wyoming > Albany County > Laramie (0.04)
(10 more...)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.88)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Lightweight Data Fusion with Conjugate Mappings

Dean, Christopher L., Lee, Stephen J., Pacheco, Jason, Fisher, John W. III

arXiv.org Machine Learning, Nov-20-2020

We present an approach to data fusion that combines the interpretability of structured probabilistic graphical models with the flexibility of neural networks. The proposed method, lightweight data fusion (LDF), emphasizes posterior analysis over latent variables using two types of information: primary data, which are well-characterized but with limited availability, and auxiliary data, readily available but lacking a well-characterized statistical relationship to the latent quantity of interest. The lack of a forward model for the auxiliary data precludes the use of standard data fusion approaches, while the inability to acquire latent variable observations severely limits direct application of most supervised learning methods. LDF addresses these issues by utilizing neural networks as conjugate mappings of the auxiliary data: nonlinear transformations into sufficient statistics with respect to the latent variables. This facilitates efficient inference by preserving the conjugacy properties of the primary data and leads to compact representations of the latent variable posterior distributions. We demonstrate the LDF methodology on two challenging inference problems: (1) learning electrification rates in Rwanda from satellite imagery, high-level grid infrastructure, and other sources; and (2) inferring county-level homicide rates in the USA by integrating socio-economic data using a mixture model of multiple conjugate mappings.

auxiliary data, inference, primary data, (12 more...)

arXiv.org Machine Learning

2011.10607

Country:

Africa > Rwanda (0.25)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(20 more...)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Economy (0.65)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.34)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
(2 more...)

AICov: An Integrative Deep Learning Framework for COVID-19 Forecasting with Population Covariates

Fox, Geoffrey C., von Laszewski, Gregor, Wang, Fugang, Pyne, Saumyadipta

arXiv.org Machine Learning, Oct-8-2020

The COVID-19 pandemic has had profound global consequences for health, economic, social, political, and almost every other major aspect of human life. Therefore, it is of great importance to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of AICov, which provides an integrative deep learning framework for COVID-19 forecasting with population covariates, some of which may serve as putative risk factors. We have integrated multiple different strategies into AICov, including the ability to use deep learning strategies based on LSTM and even modeling. To demonstrate our approach, we have conducted a pilot that integrates population covariates from multiple sources. Thus, AICov includes not only data on COVID-19 cases and deaths but, more importantly, the population's socioeconomic, health, and behavioral risk factors at a local level. The compiled data are fed into AICov, and integrating them into our model yields improved predictions compared to a model that uses only case and death data.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

2010.03757

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Washington (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ford replaces CEO Mark Fields in push to transform business

Associated Press, May-22-2017, 15:51:45 GMT

FILE - In this April 12, 2017 file photo, Ford Motor Co. President and CEO Mark Fields speaks during a media preview of the 2018 Lincoln Navigator at the New York International Auto Show in New York. Ford is replacing its CEO amid questions about its current performance and future strategy, a person familiar with the situation has said. Fields will be replaced by Jim Hackett, who joined Ford's board in 2013.

artificial intelligence, ford, hackett, (13 more...)

Associated Press

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Michigan > Wayne County > Wayne (0.05)
North America > United States > Michigan > Wayne County > Dearborn (0.05)
(6 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Automobiles & Trucks > Manufacturer (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.31)

Trump tweets himself praise as Ford dumps plan for Mexico plant, looks to hire more in Michigan

The Japan Times, Jan-3-2017, 21:50:17 GMT

WASHINGTON – Ford scuttled a plan to build a new factory in Mexico Tuesday following criticism from Donald Trump, and just hours after the president-elect attacked General Motors for importing Mexican-made cars into the US. Following months of criticism from Trump for its investments in Mexico, Ford said it was spiking a plan to build a new $1.6 billion plant in San Luis Potosi, and would instead invest $700 million over the next four years to expand its Flat Rock Assembly Plant in Michigan to build electric and self-driving vehicles. Ford chief executive Mark Fields said the second-biggest U.S. automaker was hopeful Trump's policies will boost the U.S. manufacturing environment. "It's literally a vote of confidence around some of the pro-growth policies that he has been outlining and that's why we're making this decision to invest here in the U.S. and our plant here in Michigan," Fields told CNN. Earlier, GM became the latest multinational to end up in Trump's line of fire -- via Twitter as usual -- with the president-elect threatening to impose a tariff on GM's imports of a small number of Mexican-made Chevy Cruze cars to the U.S. Trump took to Twitter again to crow about the Ford reversal.

artificial intelligence, social media, trump, (16 more...)

The Japan Times

Country:

South America > Bolivia > Potosí Department > Tomás Frías Province > Potosí (0.25)
North America > Mexico > San Luis Potosí (0.25)
North America > Canada (0.18)
(6 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Government > Foreign Policy (1.00)
(3 more...)

Technology:

Information Technology > Communications > Social Media (0.81)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.56)

Ford cancels Mexico factory and will invest in Michigan in 'vote of confidence' for Trump plans

Los Angeles Times, Jan-3-2017, 19:45:08 GMT

Ford Motor Co. said Tuesday it was scrapping plans to build a $1.6-billion factory in Mexico and would invest $700 million to expand a Michigan plant to build electric and autonomous vehicles that will add 700 jobs there in a move Ford's chief executive said was a "vote of confidence" in the economic policies of President-elect Donald Trump. Ford isn't abandoning expanded production in Mexico. The company said that to "improve company profitability" it would build its next-generation Ford Focus at an existing plant in Hermosillo, Mexico. But in the wake of criticism by President-elect Donald Trump of the U.S. automaker and other companies moving manufacturing jobs across the border, Ford said it would cancel its plans for a major new plant in San Luis Potosi, Mexico. A company news release didn't mention Trump, but Chief Executive Mark Fields told CNN on Tuesday that the new plans were "a vote of confidence" in the direction of the U.S. economy under the president-elect.

artificial intelligence, press release, vehicle, (11 more...)

Los Angeles Times

Country:

South America > Bolivia > Potosí Department > Tomás Frías Province > Potosí (0.26)
North America > Mexico > Sonora > Hermosillo (0.26)
North America > Mexico > San Luis Potosí (0.26)
(3 more...)

Genre: Press Release (0.59)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Automobiles & Trucks > Manufacturer (1.00)
Government > Regional Government > North America Government > United States Government (0.95)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.39)
